ORDERED_LOGIT

Overview

The ORDERED_LOGIT function fits an ordered logistic regression model (also known as the proportional odds model) for ordinal dependent variables. This type of regression is appropriate when the outcome has naturally ordered categories—such as survey responses ranging from “strongly disagree” to “strongly agree,” bond ratings, or health status levels—where the ordering matters but the intervals between categories are not assumed to be equal.

Ordered logit is based on a latent variable framework. The model assumes an unobserved continuous variable y^* underlies the observed categorical responses:

y^* = X\beta + \varepsilon

where X represents the predictor variables, \beta are the regression coefficients, and \varepsilon follows a standard logistic distribution. The observed ordinal outcome y is determined by where y^* falls relative to a set of cut points (thresholds) \mu_1, \mu_2, \ldots, \mu_{K-1} for K categories:

y = k \quad \text{if} \quad \mu_{k-1} < y^* \leq \mu_k

The probability of observing category k is:

P(y = k | X) = F(\mu_k - X\beta) - F(\mu_{k-1} - X\beta)

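where F is the cumulative distribution function of the standard logistic distribution, F(z) = 1 / (1 + e^{-z}), with the conventions \mu_0 = -\infty and \mu_K = +\infty so that the first and last categories are covered. In these formulas the categories are indexed k = 1, \ldots, K; the integer codes 0, 1, 2, … passed to the function map to them in sorted order.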
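The probability formula above can be sketched in a few lines of Python. This is a minimal illustration, not part of ORDERED_LOGIT itself: the helper names `logistic_cdf` and `category_probabilities` and the numeric inputs are made up for the example.

```python
import math

def logistic_cdf(z):
    """CDF of the standard logistic distribution, written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def category_probabilities(xb, cut_points):
    """Return [P(y = 1), ..., P(y = K)] for a linear predictor xb = X*beta.

    cut_points holds mu_1 .. mu_{K-1} in increasing order; the outer
    thresholds mu_0 = -inf and mu_K = +inf are added here.
    """
    cdf = [0.0]                                   # F(mu_0 - xb) = 0
    cdf += [logistic_cdf(mu - xb) for mu in cut_points]
    cdf.append(1.0)                               # F(mu_K - xb) = 1
    # Each category probability is the difference of adjacent CDF values.
    return [cdf[k] - cdf[k - 1] for k in range(1, len(cdf))]

# Illustrative (hypothetical) values: X*beta = 1.0, cut points 0.0 and 2.0
probs = category_probabilities(1.0, [0.0, 2.0])
print([round(p, 4) for p in probs])
```

Because the differences telescope, the probabilities always sum to 1 as long as the cut points are increasing.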

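This implementation uses the OrderedModel class from the statsmodels library. The function returns one row per estimated parameter (the coefficient of each predictor and the cut-point parameters that separate the ordered categories), giving the estimate, standard error, z-statistic, p-value, and confidence interval. Note that statsmodels estimates the first cut point directly and each later one as the log of its increment over the previous cut point, so the raw threshold parameters after the first are not cut points themselves. Model fit statistics include McFadden's pseudo R-squared, log-likelihood, AIC, and BIC.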
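OrderedModel keeps the cut points ordered by estimating the first threshold as-is and every later threshold as a log increment over the previous one. The sketch below undoes that parameterization by hand, assuming the raw threshold parameters are given in order. The helper name `thresholds_from_raw` and the input values are hypothetical; statsmodels offers its own `transform_threshold_params` method on the model for the same purpose.

```python
import math

def thresholds_from_raw(raw):
    """Convert raw threshold parameters into increasing cut points.

    raw[0] is the first cut point; raw[k] for k >= 1 is the log of the
    distance between cut point k and cut point k - 1.
    """
    if not raw:
        return []
    cuts = [raw[0]]
    for log_increment in raw[1:]:
        cuts.append(cuts[-1] + math.exp(log_increment))
    return cuts

# Hypothetical raw values: first cut 1.5, then increments of 1 and 2
raw_params = [1.5, 0.0, math.log(2.0)]
cut_points = thresholds_from_raw(raw_params)
print(cut_points)
```

Since every increment is an exponential, the recovered cut points are strictly increasing whatever the raw values are.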

Common applications include analyzing Likert-scale survey data, credit ratings, educational attainment levels, and any scenario where outcomes fall into ranked categories. For theoretical background, see the Wikipedia article on ordered logit and the original work by McCullagh (1980).

This example function is provided as-is without any representation of accuracy.

Excel Usage

=ORDERED_LOGIT(y, x, fit_intercept, alpha)
  • y (list[list], required): Ordinal dependent variable as a column vector with integer category values (0, 1, 2, …) representing ordered categories.
  • x (list[list], required): Independent variables (predictors) as a matrix where each column represents a different predictor variable.
  • fit_intercept (bool, optional, default: true): Reserved for API consistency; has no effect since ordered models use cut points instead of intercepts.
  • alpha (float, optional, default: 0.05): Significance level for confidence intervals, between 0 and 1.

Returns (list[list]): 2D list with ordered logit results, or error string.

Examples

Example 1: Basic three-category model with one predictor

Inputs:

y x
0 1
0 1.2
0 1.4
0 1.6
0 1.8
0 2
1 1.8
1 2
1 2.2
1 2.4
1 2.6
1 2.8
1 3
1 3.2
2 2.8
2 3
2 3.2
2 3.4
2 3.6
2 3.8

Excel formula:

=ORDERED_LOGIT({0;0;0;0;0;0;1;1;1;1;1;1;1;1;2;2;2;2;2;2}, {1;1.2;1.4;1.6;1.8;2;1.8;2;2.2;2.4;2.6;2.8;3;3.2;2.8;3;3.2;3.4;3.6;3.8})

Expected output:

parameter coefficient std_error z_statistic p_value ci_lower ci_upper
x0 6.099 2.441 2.498 0.01248 1.314 10.88
cut_0/1 11.58 4.742 2.442 0.01462 2.284 20.87
cut_1/2 1.906 0.4324 4.408 0.00001043 1.059 2.753
pseudo_r_squared 0.6101
log_likelihood -8.49
aic 22.98
bic 25.97
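The z_statistic and p_value columns are consistent with a standard two-sided Wald test on each estimate. As a rough check, assuming that is how they are computed, the first row of the table above (estimate 6.099, standard error 2.441) can be reproduced with the standard library; `wald_z_and_p` is a name made up for this sketch.

```python
from statistics import NormalDist

def wald_z_and_p(estimate, std_error):
    """Return the Wald z-statistic and its two-sided normal p-value."""
    z = estimate / std_error
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p

# Rounded values taken from the first row of the Example 1 output
z, p = wald_z_and_p(6.099, 2.441)
print(round(z, 3), round(p, 4))
```

Small differences in the last digit are expected because the inputs here are already rounded to four significant figures.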

Example 2: fit_intercept set to FALSE (no effect; results match Example 1)

Inputs:

y x fit_intercept
0 1 false
0 1.2
0 1.4
0 1.6
0 1.8
0 2
1 1.8
1 2
1 2.2
1 2.4
1 2.6
1 2.8
1 3
1 3.2
2 2.8
2 3
2 3.2
2 3.4
2 3.6
2 3.8

Excel formula:

=ORDERED_LOGIT({0;0;0;0;0;0;1;1;1;1;1;1;1;1;2;2;2;2;2;2}, {1;1.2;1.4;1.6;1.8;2;1.8;2;2.2;2.4;2.6;2.8;3;3.2;2.8;3;3.2;3.4;3.6;3.8}, FALSE)

Expected output:

parameter coefficient std_error z_statistic p_value ci_lower ci_upper
x0 6.099 2.441 2.498 0.01248 1.314 10.88
cut_0/1 11.58 4.742 2.442 0.01462 2.284 20.87
cut_1/2 1.906 0.4324 4.408 0.00001043 1.059 2.753
pseudo_r_squared 0.6101
log_likelihood -8.49
aic 22.98
bic 25.97
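The pseudo_r_squared value can be checked by hand, assuming it is McFadden's measure, 1 - llf / llnull, where llnull is the log-likelihood of a model with cut points only. For such a null model the fitted category probabilities equal the sample proportions, here 6, 8, and 6 observations out of 20. The function names below are illustrative.

```python
import math

def null_log_likelihood(counts):
    """Log-likelihood of a thresholds-only model from the category counts."""
    n = sum(counts)
    return sum(c * math.log(c / n) for c in counts if c > 0)

def mcfadden_r2(llf, llnull):
    """McFadden's pseudo R-squared."""
    return 1.0 - llf / llnull

ll_null = null_log_likelihood([6, 8, 6])   # category counts in the example data
r2 = mcfadden_r2(-8.49, ll_null)           # -8.49 is the reported log_likelihood
print(round(ll_null, 3), round(r2, 4))
```

The result agrees with the reported 0.6101 up to the rounding of the log-likelihood.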

Example 3: Custom significance level (90% CI)

Inputs:

y x alpha
0 1 0.1
0 1.2
0 1.4
0 1.6
0 1.8
0 2
1 1.8
1 2
1 2.2
1 2.4
1 2.6
1 2.8
1 3
1 3.2
2 2.8
2 3
2 3.2
2 3.4
2 3.6
2 3.8

Excel formula:

=ORDERED_LOGIT({0;0;0;0;0;0;1;1;1;1;1;1;1;1;2;2;2;2;2;2}, {1;1.2;1.4;1.6;1.8;2;1.8;2;2.2;2.4;2.6;2.8;3;3.2;2.8;3;3.2;3.4;3.6;3.8}, , 0.1)

Expected output:

parameter coefficient std_error z_statistic p_value ci_lower ci_upper
x0 6.099 2.441 2.498 0.01248 2.084 10.11
cut_0/1 11.58 4.742 2.442 0.01462 3.781 19.37
cut_1/2 1.906 0.4324 4.408 0.00001043 1.194 2.617
pseudo_r_squared 0.6101
log_likelihood -8.49
aic 22.98
bic 25.97
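Only the ci_lower and ci_upper columns change with alpha. The intervals match a normal-based Wald interval, estimate ± z_{1-alpha/2} × std_error. The sketch below assumes that construction and uses the rounded first-row values (6.099 and 2.441); `wald_confidence_interval` is a hypothetical helper name.

```python
from statistics import NormalDist

def wald_confidence_interval(estimate, std_error, alpha=0.05):
    """Return the (lower, upper) bounds of a (1 - alpha) Wald interval."""
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z_crit * std_error
    return estimate - half_width, estimate + half_width

lo90, hi90 = wald_confidence_interval(6.099, 2.441, alpha=0.1)
lo95, hi95 = wald_confidence_interval(6.099, 2.441, alpha=0.05)
print(round(lo90, 3), round(hi90, 2))
print(round(lo95, 3), round(hi95, 2))
```

A larger alpha gives a smaller critical value and therefore a narrower interval, which is why the 90% bounds sit inside the 95% bounds from Example 1.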

Example 4: Multiple predictors with all arguments specified

Inputs:

y x0 x1 fit_intercept alpha
0 1 1 true 0.05
0 1.2 0.9
0 1.4 1.1
0 1.6 0.8
0 1.8 1.2
0 2 0.7
1 1.8 1.3
1 2 1.4
1 2.2 0.9
1 2.4 1.5
1 2.6 1
1 2.8 1.6
1 3 1.1
1 3.2 1.7
2 2.8 1.8
2 3 1.2
2 3.2 1.9
2 3.4 1.3
2 3.6 2
2 3.8 1.4

Excel formula:

=ORDERED_LOGIT({0;0;0;0;0;0;1;1;1;1;1;1;1;1;2;2;2;2;2;2}, {1,1;1.2,0.9;1.4,1.1;1.6,0.8;1.8,1.2;2,0.7;1.8,1.3;2,1.4;2.2,0.9;2.4,1.5;2.6,1;2.8,1.6;3,1.1;3.2,1.7;2.8,1.8;3,1.2;3.2,1.9;3.4,1.3;3.6,2;3.8,1.4}, TRUE, 0.05)

Expected output:

parameter coefficient std_error z_statistic p_value ci_lower ci_upper
x0 6.216 2.927 2.124 0.03369 0.4794 11.95
x1 3.659 2.555 1.432 0.1521 -1.349 8.667
cut_0/1 15.78 6.85 2.304 0.02124 2.355 29.21
cut_1/2 2.111 0.4512 4.68 0.000002872 1.227 2.996
pseudo_r_squared 0.6677
log_likelihood -7.236
aic 22.47
bic 26.46
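The aic and bic rows follow from the log-likelihood under the usual definitions, AIC = 2k - 2·llf and BIC = k·ln(n) - 2·llf. Here k = 4 estimated parameters (two coefficients and two threshold parameters) and n = 20 observations. The sketch assumes those definitions; `aic_bic` is a name chosen for illustration.

```python
import math

def aic_bic(llf, n_params, n_obs):
    """Return (AIC, BIC) for a model with log-likelihood llf."""
    aic = 2.0 * n_params - 2.0 * llf
    bic = math.log(n_obs) * n_params - 2.0 * llf
    return aic, bic

# Values from Example 4: log_likelihood = -7.236, 4 parameters, 20 rows
aic, bic = aic_bic(-7.236, 4, 20)
print(round(aic, 2), round(bic, 2))
```

The same formulas with k = 3 and llf = -8.49 give the 22.98 and 25.97 reported for the one-predictor examples.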

Python Code

import math
import numpy as np
from statsmodels.miscmodels.ordinal_model import OrderedModel as statsmodels_ordered_model

def ordered_logit(y, x, fit_intercept=True, alpha=0.05):
    """
    Fits an ordered logistic regression model for ordinal outcomes.

    See: https://www.statsmodels.org/stable/generated/statsmodels.miscmodels.ordinal_model.OrderedModel.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        y (list[list]): Ordinal dependent variable as a column vector with integer category values (0, 1, 2, ...) representing ordered categories.
        x (list[list]): Independent variables (predictors) as a matrix where each column represents a different predictor variable.
        fit_intercept (bool, optional): Reserved for API consistency; has no effect since ordered models use cut points instead of intercepts. Default is True.
        alpha (float, optional): Significance level for confidence intervals, between 0 and 1. Default is 0.05.

    Returns:
        list[list]: 2D list with ordered logit results, or error string.
    """
    def to2d(val):
        return [[val]] if not isinstance(val, list) else val

    def validate_numeric(val, name):
        if not isinstance(val, (int, float)):
            return f"Invalid input: {name} must be a number."
        if math.isnan(val) or math.isinf(val):
            return f"Invalid input: {name} must be finite."
        return None

    # Normalize inputs
    y = to2d(y)
    x = to2d(x)

    # Validate y is a column vector
    if not isinstance(y, list) or len(y) == 0:
        return "Invalid input: y must be a non-empty 2D list."
    if not all(isinstance(row, list) and len(row) == 1 for row in y):
        return "Invalid input: y must be a column vector (2D list with one column)."

    # Validate x is a matrix
    if not isinstance(x, list) or len(x) == 0:
        return "Invalid input: x must be a non-empty 2D list."
    if not all(isinstance(row, list) for row in x):
        return "Invalid input: x must be a 2D list."

    num_rows_x = len(x)
    num_cols_x = len(x[0]) if num_rows_x > 0 else 0
    if num_cols_x == 0:
        return "Invalid input: x must have at least one column."
    if not all(len(row) == num_cols_x for row in x):
        return "Invalid input: x must have consistent row lengths."

    # Check y and x have same number of rows
    if len(y) != num_rows_x:
        return "Invalid input: y and x must have the same number of rows."

    # Validate fit_intercept
    if not isinstance(fit_intercept, bool):
        return "Invalid input: fit_intercept must be a boolean."

    # Validate alpha
    err = validate_numeric(alpha, "alpha")
    if err:
        return err
    if alpha <= 0 or alpha >= 1:
        return "Invalid input: alpha must be between 0 and 1."

    # Extract y values
    y_flat = []
    for row in y:
        val = row[0]
        err = validate_numeric(val, "y value")
        if err:
            return err
        y_flat.append(val)

    # Check y values are integers
    for val in y_flat:
        if val != int(val):
            return "Invalid input: y must contain integer category values."

    # Extract x values
    x_matrix = []
    for row in x:
        x_row = []
        for val in row:
            err = validate_numeric(val, "x value")
            if err:
                return err
            x_row.append(float(val))
        x_matrix.append(x_row)

    # Convert to numpy arrays
    y_array = np.array(y_flat)
    x_array = np.array(x_matrix)

    # Set parameter names
    param_names = [f"x{i}" for i in range(num_cols_x)]

    # Fit the ordered logit model
    # Note: OrderedModel uses cut points (thresholds) instead of traditional intercepts.
    # The cut points are always estimated and capture what would be the intercept.
    # The fit_intercept parameter is kept for API consistency but has no effect.
    try:
        model = statsmodels_ordered_model(y_array, x_array, distr='logit')
        result = model.fit(disp=0, method='bfgs')
    except Exception as exc:  # noqa: BLE001
        return f"Model fitting error: {exc}"

    # Extract results
    output = [["parameter", "coefficient", "std_error", "z_statistic", "p_value", "ci_lower", "ci_upper"]]

    # Get confidence intervals
    try:
        conf_int = result.conf_int(alpha=alpha)
    except Exception as exc:  # noqa: BLE001
        return f"Confidence interval error: {exc}"

    # Extract estimates. statsmodels orders OrderedModel parameters as the
    # predictor coefficients first, followed by the threshold parameters.
    params = result.params
    std_errors = result.bse
    z_stats = result.tvalues
    p_values = result.pvalues

    # Determine number of categories
    n_categories = len(set(y_flat))
    n_thresholds = n_categories - 1

    def make_row(name, i):
        # Build one output row for the parameter at position i
        return [
            name,
            float(params[i]),
            float(std_errors[i]),
            float(z_stats[i]),
            float(p_values[i]),
            float(conf_int[i, 0]),
            float(conf_int[i, 1])
        ]

    # Add predictor parameters (the first num_cols_x entries)
    for i in range(num_cols_x):
        output.append(make_row(param_names[i], i))

    # Add threshold parameters (the last n_thresholds entries).
    # The first is the cut point itself; each later one is the log of the
    # increment over the previous cut point, as parameterized by statsmodels.
    for j in range(n_thresholds):
        output.append(make_row(f"cut_{j}/{j+1}", num_cols_x + j))

    # Add model statistics
    output.append(["pseudo_r_squared", float(result.prsquared), "", "", "", "", ""])
    output.append(["log_likelihood", float(result.llf), "", "", "", "", ""])
    output.append(["aic", float(result.aic), "", "", "", "", ""])
    output.append(["bic", float(result.bic), "", "", "", "", ""])

    return output
